10 research outputs found

    Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

    Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.

    A social semantic approach to adaptive query expansion

    Classic query expansion approaches are based on the use of two-dimensional co-occurrence matrices. In this paper, we propose the adoption of three-dimensional matrices, where the added dimension is represented by semantic classes (i.e., categories comprising all the terms that share a semantic property) related to the folksonomy extracted from social bookmarking services, such as Delicious and StumbleUpon. The results of an in-depth experimental evaluation performed on real users show that our approach outperforms traditional techniques, thereby confirming the validity and usefulness of categorizing user needs and preferences into semantic classes.
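
The idea of adding a semantic-class dimension to a term co-occurrence matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the sparse dictionary layout, the function names `build_cooccurrence` and `expand`, the raw-count scoring, and the toy bookmark data are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_cooccurrence(resources):
    """Build a sparse 3-D co-occurrence table (hypothetical layout).

    resources: list of (terms, semantic_classes) pairs, one per bookmarked page.
    Returns a dict mapping (term_a, term_b, semantic_class) to the number of
    resources in which both terms occur under that class.
    """
    counts = defaultdict(int)
    for terms, classes in resources:
        unique_terms = sorted(set(terms))
        for a in unique_terms:
            for b in unique_terms:
                if a == b:
                    continue
                for cls in classes:
                    counts[(a, b, cls)] += 1
    return counts


def expand(query_term, semantic_class, counts, k=2):
    """Return up to k expansion terms for query_term within one semantic class.

    Candidates are ranked by raw co-occurrence count (an assumed scoring rule),
    with ties broken alphabetically so the result is deterministic.
    """
    scored = [
        (b, n)
        for (a, b, cls), n in counts.items()
        if a == query_term and cls == semantic_class
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in scored[:k]]


# Toy data (invented): each resource has its terms and its folksonomy classes.
resources = [
    (["java", "coffee", "espresso"], ["food"]),
    (["java", "coffee", "beans"], ["food"]),
    (["java", "python", "compiler"], ["programming"]),
    (["java", "compiler", "jvm"], ["programming"]),
]
counts = build_cooccurrence(resources)

food_terms = expand("java", "food", counts)
programming_terms = expand("java", "programming", counts)
```

With a plain two-dimensional matrix, the ambiguous term "java" would pull in coffee-related and programming-related terms together. Slicing by semantic class keeps the two senses apart, which is the intuition behind the extra dimension.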